Abstract



Copula models have become one of the most widely used tools in the applied modelling 
of multivariate data. Similarly, Bayesian methods are increasingly used to obtain efficient 
likelihood-based inference. However, to date, there has been only limited use of Bayesian
approaches in the formulation and estimation of copula models. This article aims to address 
this shortcoming in two ways. First, to introduce copula models and aspects of copula theory 
that are especially relevant for a Bayesian analysis. Second, to outline Bayesian approaches 
to formulating and estimating copula models, and their advantages over alternative methods. 
Copulas covered include Archimedean, copulas constructed by inversion, and vine copulas; 
along with their interpretation as transformations. A number of parameterisations of a 
correlation matrix of a Gaussian copula are considered, along with hierarchical priors that 
allow for Bayesian selection and model averaging for each parameterisation. Markov chain 
Monte Carlo sampling schemes for fitting Gaussian and D-vine copulas, with and without 
selection, are given in detail. The relationship between the prior for the parameters of a 
D-vine, and the prior for a correlation matrix of a Gaussian copula, is discussed. Last, 
it is shown how to compute Bayesian inference when the data are discrete-valued using 
data augmentation. This approach generalises popular Bayesian methods for the estimation 
of models for multivariate binary and other ordinal data to more general copula models. 
Bayesian data augmentation has substantial advantages over other methods of estimation 
for this class of models. 
1 Introduction



Copula models are now used widely in the empirical analysis of multivariate data. For 
example, major areas of application include survival analysis, where much early work oc- 
curred (Clayton 1978; Oakes 1989), actuarial science (Frees and Valdez 1998), finance 
(Li 2000; Cherubini, Luciano and Vecchiato 2004; McNeil, Frey and Embrechts 2005), mar- 
keting (Danaher and Smith 2011), transport studies (Bhat and Eluru 2009; Smith and 
Kauermann 2011), medical statistics (Lambert and Vandenhende 2002; Nikoloulopoulos and 
Karlis 2008) and econometrics (Smith 2003; Cameron et al. 2004; Patton 2006). Copula 
models are popular because they are flexible tools for the modelling of complex relationships 
between variables in a simple manner. They allow for the marginal distributions of data to 
be modelled separately in an initial step, and then dependence between variables is captured 
using a copula function. 

However, the development of estimation and statistical inferential methodology for copula 
models has been limited. Most research has either been focused on the development and
properties of copula functions (see Joe 1997 and Nelsen 2006 for excellent overviews), or 
their use in solving applied problems. Less attention has been given to the question of how 
to estimate the increasing variety of copula models in an effective manner. To date, the most 
popular estimation methods are full or two-stage maximum likelihood estimation (Joe 2005) 
and method of moments style estimators in low dimensions (Genest and Rivest 1993). There 
has been only limited work on developing Bayesian approaches to formulate and estimate 
copula models. This is surprising, given that Bayesian methods have proven successful in 
both formulating and estimating multivariate models elsewhere. The aim of this article is 
two-fold: (i) to introduce contemporary copula modelling to Bayesian statisticians, and (ii) to 
outline the advantages of Bayesian inference when applied to copula models. Therefore, there 
are two intended audiences: (i) Bayesians who are unfamiliar with the advances and features 
of copula models, and (ii) users of copula models who are unfamiliar with the advantages 
and features of modern Bayesian inferential methods. 

Previous Bayesian work on copula modelling includes that of Huard, Evin and Favre (2006),
who suggest a method to select between different bivariate copulas, and Silva and Lopes (2008) 
who use Markov chain Monte Carlo (MCMC) methods to estimate low dimensional paramet- 
ric copula functions. Pitt, Chan and Kohn (2006), Hoff (2007) and Danaher and Smith (2011) 
estimate Gaussian copula regression models using MCMC methods. Note that adopting 
a Gaussian copula does not mean the data are normally distributed. Smith, Gan and
Kohn (2010b) extend the work of Pitt, Chan and Kohn (2006) to copulas derived by in- 
version from skew t distributions constructed by hidden conditioning. Smith et al. (2010) 
and Min and Czado (2010; 2011) propose methods to estimate so-called 'vine' copulas with
continuous margins using MCMC. Pitt, Chan and Kohn (2006) show how Bayesian covariance
selection approaches can be used in Gaussian copulas, while Smith et al. (2010) and
Min and Czado (2011) also show how Bayesian selection ideas can be applied to determine 
whether, or not, the component 'pair-copulas' of a vine copula are equal to the bivariate in- 
dependence copula. Smith et al. (2010) also show that the D-vine copula provides a natural 
decomposition for serial dependence. Ausin and Lopes (2010) consider Bayesian estimation 
of multivariate time series with copula-based time varying cross-sectional dependence. Last,
Smith and Khaled (2011) suggest efficient Bayesian data augmentation methodology for the 
estimation of copula models for multivariate discrete data, or a combination of discrete and 
continuous data. Their approach is for general copula functions, not just Gaussian copulas, 
or copulas constructed by inversion. 

This article is divided into three main sections. The first provides an introduction to 
copula modelling. There are a number of excellent in-depth introductions to copulas and their
properties; for example, see Joe (1997) and Nelsen (2006). The purpose of this section is 
not to replicate any of these, but to introduce aspects that are important in Bayesian copula 
modelling. This includes an outline of what makes copula models so useful, how copula
models can be viewed as transformations, what are copulas constructed by inversion and 
vine copulas, and why the D-vine copula is a natural model of serial dependence. 

In the next two sections Bayesian approaches to formulating and estimating copula mod- 
els are discussed separately for multivariate continuous and discrete data. This is because 
copula models, and associated methods, differ substantially in these two cases. In Section 3
the advantages of using Bayesian inference over maximum likelihood for the case of continuous
data are discussed. For the Gaussian copula, a sampling scheme that can be used to evaluate 
the joint posterior distribution of the copula and any marginal model parameters is outlined 
in detail. Different priors for the correlation matrix of the Gaussian copula are considered, 
including priors based on a Cholesky factorisation, the partial correlations as in Pitt, Chan
and Kohn (2006), and the conditional correlations discussed in Joe (2005) and Daniels and 
Pourahmadi (2009). A new Bayesian selection approach using the latter is outlined, where 
the fitted copula model is a Bayesian model average over parsimonious representations of the 
dependence structure. Bayesian estimation and selection for D-vine copulas is also outlined. 
An interesting insight is that Bayesian selection of individual pair-copulas nests Bayesian se- 
lection of the conditional correlations for a Gaussian copula. Bayesian estimates of popular 
dependence metrics from the fitted copula are also discussed, where parameter uncertainty 
can be integrated out using the Monte Carlo iterates from the sampling scheme. 

Denuit and Lambert (2005) and Genest and Neslehova (2007) point out that popular 
method of moments style estimators based on ranks should not be used to estimate copula
models for discrete data, making likelihood-based inference more important. However, the 
likelihood function differs substantially from that in the continuous case, and computational 
issues mean that maximum likelihood estimation is more difficult than in the continuous 
case. An effective solution is to employ Bayesian data augmentation, as outlined for a 
Gaussian copula in Section 4. The priors for the correlation matrix of the Gaussian copula,
and also the Bayesian selection framework, are unaffected by whether the data are discrete
or continuous. Last, it is discussed how measuring dependence in discrete data differs from 
that in the continuous case. 

2 What Are Copula Models? 
2.1 The basic idea 

Consider initially the bivariate case with two random variables, $Y_1$ and $Y_2$, with marginal
distribution functions $F_1(y_1)$ and $F_2(y_2)$, respectively. A copula model is a way of constructing
the joint distribution of $(Y_1, Y_2)$. Sklar (1959) shows that there always exists a bivariate
function $C : [0,1]^2 \to [0,1]$, such that

$$F(y_1, y_2) = C(F_1(y_1), F_2(y_2)).$$

The function $C$ is itself a distribution function with uniform margins on $[0,1]$, and is labelled
the 'copula function'. It binds together the univariate margins $F_1$ and $F_2$ to produce the bivariate
distribution $F$.

If both margins $F_1$ and $F_2$ are continuous distribution functions, then there is a unique
copula function $C$ for any given joint distribution function $F$. If either $F_1$ or $F_2$ is
discrete-valued, then $C$ is not unique. However, the objective of copula modelling is not to find the
copula function(s) $C$ that satisfy Sklar's representation, given knowledge of $F_1$, $F_2$ and $F$.
Instead, the objective is to construct a joint distribution $F$ from a copula function $C$ and
marginal models for $F_1$ and $F_2$. In this way, copula models can be used equally for discrete
or continuous data, or a combination of both.

It is important to notice that the copula function $C$ does not determine the marginal
distributions of $F$, but accounts for dependence between $Y_1$ and $Y_2$. For example, in the
case where $Y_1$ and $Y_2$ are independent, the copula function is $C(u_1, u_2) = u_1 u_2$, so that
$F(y_1, y_2) = F_1(y_1) F_2(y_2)$. This copula function is called the 'independence copula'.

The copula model is easily generalised to $m$ dimensions as follows. Let $Y = (Y_1, \ldots, Y_m) \in S_Y$
be a random vector with elements that have marginal distribution functions $F_1, \ldots, F_m$,
then the joint distribution function of $Y$ is

$$F(y_1, \ldots, y_m) = C(F_1(y_1), \ldots, F_m(y_m)). \qquad (2.1)$$

Again, the copula function $C : [0,1]^m \to [0,1]$ is itself a distribution function for random
vector $U = (U_1, \ldots, U_m)'$ with uniform margins on $[0,1]$. As before, if all elements of $Y$ are
continuous random variables, then there is a unique copula function $C$ for any given $F$, but
this is not the case if one or more elements are discrete-valued. Nevertheless, Equation (2.1)
can still be used to construct a well-defined joint distribution $F$, given $F_1, \ldots, F_m$ and $C$,
just as in the bivariate case.
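The construction in Equation (2.1) can be sketched in a few lines of Python. This is only an illustration for the bivariate case: the exponential and normal margins, their parameter values, and the choice of a Clayton copula with parameter 2 are assumptions made for the example, not choices made in the text.

```python
import math
from statistics import NormalDist


def clayton_copula(u1, u2, phi=2.0):
    """Clayton copula function for phi > 0 (one possible choice of C)."""
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    return (u1 ** (-phi) + u2 ** (-phi) - 1.0) ** (-1.0 / phi)


def independence_copula(u1, u2):
    """Independence copula C(u1, u2) = u1 * u2."""
    return u1 * u2


def exponential_cdf(y, rate=0.5):
    """Marginal distribution function F_1 (hypothetical exponential margin)."""
    return -math.expm1(-rate * y) if y > 0.0 else 0.0


# Marginal distribution function F_2 (hypothetical normal margin).
normal_cdf = NormalDist(mu=1.0, sigma=2.0).cdf


def joint_cdf(y1, y2, copula):
    """F(y1, y2) = C(F_1(y1), F_2(y2)): margins first, dependence via the copula."""
    return copula(exponential_cdf(y1), normal_cdf(y2))
```

Swapping `copula` changes only the dependence structure; with `independence_copula` the joint distribution function reduces to the product of the two margins, and sending `y2` to a very large value recovers the first margin whichever copula is used.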

2.2 Why are copula models so useful? 

A key feature of the copula representation of a joint distribution is that it allows for the mar- 
gins to be modelled separately from the dependence structure. This promotes a 'bottom-up' 
modelling strategy, where models are first developed one-by-one for each univariate margin. 
Dependence is then introduced by an appropriate copula function C. Sklar's theorem re- 
assures that this is not an ad-hoc approach, and that there should be at least one copula 
function C that correctly constructs the joint distribution F, as long as the marginal models 
$F_1, \ldots, F_m$ are accurate. Compare this to a more restrictive 'top-down' alternative, where
the joint distribution function $F$ is selected first, which then determines the form of the
marginals. For example, if $F$ is a multivariate $t$ distribution with $\nu$ degrees of freedom, then
each $F_j$ is restricted to be univariate $t$ with a common degrees of freedom $\nu$.

For much applied multivariate modelling, the flexibility that the bottom-up approach 
allows is compelling. The marginal models can be of the same form, or completely different, 
including any of the following: 

(i) Parametric Distributions: A parametric distribution $F_j(y_j; \theta_j)$, with parameters $\theta_j$.
For example, $F_j$ may be a $t$ distribution with location $\mu_j$, scale $\sigma_j > 0$ and degrees of
freedom $\nu_j > 0$, so that $\theta_j = \{\mu_j, \sigma_j, \nu_j\}$. A copula model with $t$ distributions for each
margin is more flexible than a multivariate $t$ distribution because the level of kurtosis
can differ in each dimension (Fang, Fang and Kotz 2002). For discrete data, $F_j$ may be
a negative binomial distribution with stopping parameter $r_j > 0$ and success parameter
$p_j \in (0, 1)$, so that $\theta_j = \{r_j, p_j\}$. The negative binomial is a very popular model for
count data that exhibit heterogeneity, and copula models provide flexible multivariate
extensions (Lee 1999; Nikoloulopoulos and Karlis 2010; Danaher and Smith 2011).
(ii) Nonparametric Distributions: Approaches where each margin is modelled nonparametrically
using the empirical distribution function (or a smoothed variant) have long been
advocated in the copula literature; for example, see Genest, Ghoudi and Rivest (1995), 
Shih and Louis (1995) and Chen, Fan and Tsyrennikov (2006). Similarly, Fj can be 
modelled using Bayesian nonparametric methods; see Hjort et al. (2010) for recent 
accounts of these. Alternatively, rank likelihoods can be used for each marginal model 
as outlined by Hoff (2007). In all cases, copula models provide simple multivariate
extensions of existing nonparametric methods. 

(iii) Regression Models: Univariate regression models can be used for each margin, in which 
case the resulting copula model is called a 'copula regression model' (Oakes & Ritz 2000;
Song 2000). The regression coefficients $\beta_j$ can be pooled across margins $j = 1, \ldots, m$,
so that $\beta_1 = \beta_2 = \cdots = \beta_m$, in which case the copula model is then an extension of
the multivariate regression model. If the regression coefficients differ for each margin, 
then the copula model extends the 'seemingly unrelated regression' model popular in 
econometric analysis (Zellner 1962). 

(iv) Time Series Models: When observations are made on a multivariate vector over time, 
the marginal models can be parametric time series models, and contemporaneous de- 
pendence captured via the copula function (Patton 2006; Chen and Fan 2006; Ausin 
and Lopes 2010). Popular choices are GARCH or stochastic volatility models for the 
margins. As with copula regression models, marginal parameters can either be pooled 
or allowed to vary across margins.

2.3 Copula functions and densities 

Nelsen (2006, p.45) lists the three conditions that $C$ needs to meet to be an admissible copula
function, which are:

(i) For every $u = (u_1, \ldots, u_m) \in [0,1]^m$, $C(u) = 0$ if at least one element $u_i = 0$.

(ii) If all elements of $u$ are equal to one, except $u_i$, then $C(u) = u_i$.

(iii) For each $a = (a_1, \ldots, a_m)$, $b = (b_1, \ldots, b_m) \in [0,1]^m$, such that $a_i \leq b_i$ for all
$i = 1, \ldots, m$,

$$\Delta_{a_m}^{b_m} \Delta_{a_{m-1}}^{b_{m-1}} \cdots \Delta_{a_1}^{b_1} C(v) \geq 0.$$

Here, $\Delta_{a_k}^{b_k}$ is a differencing notation defined as

$$\Delta_{a_k}^{b_k} C(u_1, \ldots, u_{k-1}, v_k, u_{k+1}, \ldots, u_m) = C(u_1, \ldots, u_{k-1}, b_k, u_{k+1}, \ldots, u_m) - C(u_1, \ldots, u_{k-1}, a_k, u_{k+1}, \ldots, u_m),$$

with $v_k$ a variable of differencing, and $v = (v_1, \ldots, v_m)$. Notice that if
$c(u) = \partial^m C(u) / \partial u_1 \cdots \partial u_m$ exists, then property (iii) is equivalent to

$$\int_{a_m}^{b_m} \cdots \int_{a_1}^{b_1} c(u) \, du_1 \cdots du_m \geq 0.$$

Properties (i) and (iii) are satisfied if $C(u)$ is a distribution function on $[0,1]^m$, while
property (ii) is satisfied if $C$ also has uniform margins. The density function $c(u)$ is commonly
referred to as the 'copula density'.
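The three conditions can be checked numerically for a candidate function. The following is a minimal sketch, assuming the copula is supplied as a Python function of a list `u`; the helper name `rectangle_volume` and the two example copulas are illustrative choices. The repeated differencing in condition (iii) is expanded as a signed sum over the corners of the box.

```python
from itertools import product


def rectangle_volume(C, a, b):
    """Apply the differencing operators of condition (iii) over the box [a, b].

    Each corner takes b_k or a_k in coordinate k; every a_k chosen flips the sign.
    """
    m = len(a)
    volume = 0.0
    for corner in product((0, 1), repeat=m):
        v = [b[k] if corner[k] == 1 else a[k] for k in range(m)]
        sign = -1.0 if (m - sum(corner)) % 2 else 1.0
        volume += sign * C(v)
    return volume


def independence_copula(u):
    """C(u) = u_1 * ... * u_m."""
    result = 1.0
    for u_k in u:
        result *= u_k
    return result


def clayton_copula(u, phi=2.0):
    """m-dimensional Clayton copula for phi > 0 (illustrative choice)."""
    if min(u) <= 0.0:
        return 0.0
    return (sum(u_k ** (-phi) for u_k in u) - len(u) + 1.0) ** (-1.0 / phi)
```

For the independence copula the volume of a box is the product of its side lengths, which gives a simple check of the sign convention. The same differencing yields the probability mass function for discrete margins.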

In the vast majority of cases parametric copula functions $C(u; \phi)$, with parameters $\phi$,
are used in applied analysis. There are a large number of choices for $C$, with Joe (1997)
and Nelsen (2006) providing overviews of a wide range of copula functions and their properties.
Particularly popular in the bivariate case are the family of Archimedean copulas; see
Nelsen (2006; Chap. 4). Three of the most popular Archimedean copulas are the Frank,
Clayton and Gumbel. These are listed in Table 1, along with their densities and measures
of dependence.

2.4 Constructing copulas by inversion (of Sklar's theorem) 

Beyond the bivariate case, copulas that are constructed through inversion of Sklar's theorem 
are popular; see Nelsen (2006, Sect. 3.1). To derive a copula function in this way, let 
$X = (X_1, \ldots, X_m) \in S_X$ have distribution function $G(x; \phi)$, with parameters $\phi$ and strictly
monotonic univariate marginal distribution functions $G_1(x_1; \phi), \ldots, G_m(x_m; \phi)$. By Sklar's
theorem, there always exists a copula function $C$, such that

$$G(x; \phi) = C(G_1(x_1; \phi), \ldots, G_m(x_m; \phi)).$$

Denoting $u_j = G_j(x_j; \phi)$, then $x_j = G_j^{-1}(u_j; \phi)$, and substituting this into the equation above
defines a copula function:

$$C(u_1, \ldots, u_m; \phi) = G(G_1^{-1}(u_1; \phi), \ldots, G_m^{-1}(u_m; \phi); \phi). \qquad (2.2)$$
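Frank ($\phi \in (-\infty, 0) \cup (0, \infty)$)

$$C(u_1, u_2; \phi) = -\frac{1}{\phi} \log\left(1 + \frac{(\exp(-\phi u_1) - 1)(\exp(-\phi u_2) - 1)}{\exp(-\phi) - 1}\right)$$

$$c(u_1, u_2; \phi) = \phi \exp(\phi(1 + u_1 + u_2))(\exp(\phi) - 1) \times \left[\exp(\phi) - \exp(\phi(1 + u_1)) - \exp(\phi(1 + u_2)) + \exp(\phi(u_1 + u_2))\right]^{-2}$$

$$\tau_{1,2}(\phi) = 1 + \frac{4}{\phi}(D_1(\phi) - 1), \quad \lambda^L_{1,2}(\phi) = \lambda^U_{1,2}(\phi) = 0$$

Clayton ($\phi \in (-1, \infty) \backslash \{0\}$)

$$C(u_1, u_2; \phi) = \max\left\{(u_1^{-\phi} + u_2^{-\phi} - 1)^{-1/\phi}, 0\right\}$$

$$c(u_1, u_2; \phi) = \max\left\{(1 + \phi)(u_1 u_2)^{-\phi - 1}(u_1^{-\phi} + u_2^{-\phi} - 1)^{-2 - 1/\phi}, 0\right\}$$

$$\tau_{1,2}(\phi) = \phi/(\phi + 2), \quad \lambda^L_{1,2}(\phi) = 2^{-1/\phi} \text{ and } \lambda^U_{1,2}(\phi) = 0$$

Gumbel ($\phi \geq 1$)

$$C(u_1, u_2; \phi) = \exp\left(-(\tilde{u}_1^{\phi} + \tilde{u}_2^{\phi})^{1/\phi}\right), \text{ where } \tilde{u}_j = -\log(u_j)$$

$$c(u_1, u_2; \phi) = C(u_1, u_2; \phi)(u_1 u_2)^{-1}(\tilde{u}_1^{\phi} + \tilde{u}_2^{\phi})^{-2 + 2/\phi}(\tilde{u}_1 \tilde{u}_2)^{\phi - 1} \times \left[1 + (\phi - 1)(\tilde{u}_1^{\phi} + \tilde{u}_2^{\phi})^{-1/\phi}\right]$$

$$\tau_{1,2}(\phi) = 1 - \phi^{-1}, \quad \lambda^L_{1,2}(\phi) = 0 \text{ and } \lambda^U_{1,2}(\phi) = 2 - 2^{1/\phi}$$

Table 1: Copula functions, density functions and measures of dependence for the Frank,
Clayton and Gumbel copulas. For the Frank copula, the function
$D_1(\phi) = \frac{1}{\phi} \int_0^{\phi} t / (\exp(t) - 1) \, dt$ is the Debye function; see Abramowitz and Stegun (1965; p. 998).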
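As a cross-check of the closed forms in Table 1, the copula functions and Kendall's tau can be coded directly. This is a sketch only: the function names are illustrative, and the Debye function is approximated with a simple midpoint rule (an assumption made here for self-containment, not a method taken from the text).

```python
import math


def frank_cdf(u1, u2, phi):
    """Frank copula function, phi != 0."""
    num = math.expm1(-phi * u1) * math.expm1(-phi * u2)
    return -math.log1p(num / math.expm1(-phi)) / phi


def clayton_cdf(u1, u2, phi):
    """Clayton copula function, phi in (-1, inf) excluding 0."""
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    base = u1 ** (-phi) + u2 ** (-phi) - 1.0
    return base ** (-1.0 / phi) if base > 0.0 else 0.0


def gumbel_cdf(u1, u2, phi):
    """Gumbel copula function, phi >= 1."""
    if u1 <= 0.0 or u2 <= 0.0:
        return 0.0
    s = (-math.log(u1)) ** phi + (-math.log(u2)) ** phi
    return math.exp(-(s ** (1.0 / phi)))


def debye_1(phi, n=2000):
    """D_1(phi) = (1/phi) * integral_0^phi t / (exp(t) - 1) dt, by the midpoint rule."""
    h = phi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += t / math.expm1(t)
    return total * h / phi


def kendall_tau(family, phi):
    """Kendall's tau as a function of the copula parameter (Table 1)."""
    if family == "frank":
        return 1.0 + 4.0 * (debye_1(phi) - 1.0) / phi
    if family == "clayton":
        return phi / (phi + 2.0)
    if family == "gumbel":
        return 1.0 - 1.0 / phi
    raise ValueError("unknown family: " + family)
```

For example, `kendall_tau("clayton", 2.0)` and `kendall_tau("gumbel", 2.0)` both return 0.5, so these two parameter values describe the same overall concordance but, from Table 1, opposite tail behaviour.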

It is important to notice that the multivariate distribution $G$ is only used to construct the
copula function $C$, and is not the distribution function of the random vector $Y$, which
remains $F$ as given in Equation (2.1). The parameters of the distribution of $X$ are the
parameters for copula function $C$.

Elliptical distributions are common choices for G (Fang, Fang and Kotz 2002), and the 
resulting copula functions are collectively called 'elliptical copulas'. The Gaussian copula 
(Song 2000) is the most popular of these, where $G$ is the distribution function of a multivariate
normal with zero mean, correlation matrix $\Gamma$ and unit variances in each dimension.
In this case, $\phi = \Gamma$, $G(x; \phi) = \Phi_m(x; \Gamma)$ and $G_j(x_j; \phi) = \Phi_1(x_j; 1)$, with $\Phi_k(\cdot; V)$ the
distribution function of a $k$-dimensional $N(0, V)$ distribution. The Gaussian copula function is
therefore

$$C(u_1, \ldots, u_m; \phi) = \Phi_m(\Phi_1^{-1}(u_1; 1), \ldots, \Phi_1^{-1}(u_m; 1); \Gamma). \qquad (2.3)$$

The restrictions on the first and second moments of $X$ are necessary to identify the copula
parameters $\Gamma$ in the likelihood.

When each marginal distribution $F_j$ is univariate normal with mean $\mu_j$ and variance $\sigma_j^2$,
then $u_j = \Phi_1(y_j - \mu_j; \sigma_j^2)$. If a Gaussian copula is also assumed, then the copula model for $Y$
simplifies to a multivariate normal distribution with mean $\mu = (\mu_1, \ldots, \mu_m)'$ and covariance
matrix $D \Gamma D$, with $D = \mathrm{diag}(\sigma_1, \ldots, \sigma_m)$.

Other choices for $G$ include a multivariate $t$ distribution, which results in the $t$ copula
(Demarta and McNeil 2005), or a multivariate skew $t$ distribution (Smith, Gan and
Kohn 2010b). When selecting $G$, care has to be taken to consider any restrictions on $\phi$ that
may be necessary to identify the parameters in the likelihood.
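Construction by inversion also suggests a simple way to simulate from a copula model: draw $X$ from $G$, set $u_j = G_j(x_j; \phi)$, and then apply the marginal quantile functions. A minimal sketch for a bivariate Gaussian copula follows; the exponential margins and all function names are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_gaussian_copula(n, rho, rng):
    """Draw n pairs (u1, u2) from a bivariate Gaussian copula with correlation rho."""
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # X ~ N(0, Gamma) with unit variances and correlation rho.
        x1 = z1
        x2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        # u_j = Phi_1(x_j; 1) has a uniform margin on [0, 1].
        draws.append((STD_NORMAL.cdf(x1), STD_NORMAL.cdf(x2)))
    return draws


def exponential_quantile(u, rate):
    """Inverse of the hypothetical margin F_j(y) = 1 - exp(-rate * y)."""
    return -math.log1p(-u) / rate


def sample_copula_model(n, rho, rates, rng):
    """Draw Y with exponential margins and Gaussian copula dependence."""
    return [tuple(exponential_quantile(u_j, r) for u_j, r in zip(u, rates))
            for u in sample_gaussian_copula(n, rho, rng)]
```

The simulated `Y` is not normally distributed even though the copula is Gaussian; only the dependence structure is inherited from the multivariate normal.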

2.5 Copula models as transformations 

Copula modelling can be interpreted as a transformation from the domain of the data, to 
another domain where the dependence is easier to model. The transformation is depicted 
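in Figure 1. If the elements of $Y$ are continuous-valued, the transformation $Y_j \mapsto U_j$ is
one-to-one, as is the transformation $Y_j \mapsto X_j$ for inversion copulas.
The density of $Y$ is given by

$$f(y) = \frac{\partial^m}{\partial y_1 \cdots \partial y_m} C(F_1(y_1), \ldots, F_m(y_m)) = c(u) \prod_{j=1}^{m} f_j(y_j), \qquad (2.4)$$

with $u = (u_1, \ldots, u_m)$, $u_j = F_j(y_j)$, $f_j(y_j) = \frac{\partial}{\partial y_j} F_j(y_j)$ and $c(u) = \frac{\partial^m}{\partial u_1 \cdots \partial u_m} C(u)$.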

However, when the data are discrete-valued, the probability mass function is obtained
by differencing the distribution function in Equation (2.1), so that

$$\mathrm{pr}(Y = y) = \Delta_{a_m}^{b_m} \Delta_{a_{m-1}}^{b_{m-1}} \cdots \Delta_{a_1}^{b_1} C(v), \qquad (2.5)$$

where $v = (v_1, \ldots, v_m)$ are indices of differencing. The upper bound is $b_j = F_j(y_j)$ and the lower
bound $a_j = F_j(y_j^-)$ is the left-hand limit of $F_j$ at $y_j$, with $F_j(y_j^-) = F_j(y_j - 1)$ when $Y_j$ is
ordinal-valued. In this case the transformations $Y_j \mapsto U_j$ and $Y_j \mapsto X_j$ are both one-to-many.
This means that the elements $U_j | Y_j = y_j$ and $X_j | Y_j = y_j$ are only known up to bounds, with

$$F_j(y_j^-) \leq U_j < F_j(y_j) \quad \text{and}$$

$$G_j^{-1}(F_j(y_j^-)) \leq X_j < G_j^{-1}(F_j(y_j)),$$

for $j = 1, \ldots, m$. Nevertheless, $Y$, $U$ and $X$ still have distribution functions $F$, $C$ and $G$,
respectively.
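Equation (2.5) can be illustrated for $m = 2$, where the differencing expands to four evaluations of the copula function. The sketch below assumes Poisson margins and a Frank copula purely as an example; neither choice comes from the text, and the function names are illustrative.

```python
import math


def frank_cdf(u1, u2, phi):
    """Frank copula function, phi != 0."""
    num = math.expm1(-phi * u1) * math.expm1(-phi * u2)
    return -math.log1p(num / math.expm1(-phi)) / phi


def poisson_cdf(y, rate):
    """Distribution function of a Poisson margin; zero below the support."""
    if y < 0:
        return 0.0
    term = math.exp(-rate)
    total = term
    for k in range(1, int(y) + 1):
        term *= rate / k
        total += term
    return min(total, 1.0)


def bivariate_pmf(y1, y2, rates, phi):
    """pr(Y1 = y1, Y2 = y2) by differencing C over [a1, b1] x [a2, b2]."""
    b1, b2 = poisson_cdf(y1, rates[0]), poisson_cdf(y2, rates[1])
    a1, a2 = poisson_cdf(y1 - 1, rates[0]), poisson_cdf(y2 - 1, rates[1])
    return (frank_cdf(b1, b2, phi) - frank_cdf(a1, b2, phi)
            - frank_cdf(b1, a2, phi) + frank_cdf(a1, a2, phi))
```

Summing `bivariate_pmf` over one argument recovers the Poisson mass function of the other margin, which shows that the copula leaves the margins unchanged. In $m$ dimensions the same expansion needs $2^m$ evaluations of $C$, which is one reason direct likelihood evaluation becomes expensive for discrete data.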

It is outlined later in Section 4 how interpreting a copula model as a transformation
allows for the construction of Bayesian data augmentation schemes to evaluate the posterior 
distribution when one or more margins are discrete. 
Variable         $Y$          →   $U$          →   $X$
Domain           $S_Y$        →   $[0,1]^m$    →   $S_X$
Joint CDF        $F(y)$           $C(u)$           $G(x)$
Marginal CDFs    $F_j(y_j)$   →   Uniform      →   $G_j(x_j)$

Figure 1: Depiction of the transformation underlying a copula model. The right hand column
for variable $X$ is for copulas constructed by inversion only. The transformations are given
in the top row for $Y_j$ continuous-valued.



2.6 Vine copulas 

Much recent research in the copula literature has focused on building copulas in m > 2 
dimensions. One popular family of copulas are called 'vines', which are constructed from 
sequences of bivariate copulas. Joe (1996; 1997) was an early advocate of this approach, 
while Bedford and Cooke (2002) organise the different decompositions in a systematic way. 
Aas et al. (2009) called the bivariate copulas 'pair-copulas', and vines are also known as pair- 
copula constructions (PCCs). Recent overviews are given by Haff, Aas and Frigessi (2010) 
and Czado (2010). 

Smith et al. (2010) point out that if the elements of $Y$ are ordered in time, so that $Y_t$ is
observed before $Y_{t+1}$, a vine labelled 'decomposable' by Bedford and Cooke (2002) (or D-vine
for short) proves a natural way of characterising serial dependence; particularly Markovian
serial dependence. This can be motivated by considering the following decomposition of the
density of $U$,

$$c(u) = \prod_{t=2}^{m} f(u_t | u_{t-1}, \ldots, u_1),$$

where $f(u_1) = 1$ because the marginal distribution of $U_1$ is uniform on $[0,1]$. The idea is
to build a representation for each conditional distribution $f(u_t | u_{t-1}, \ldots, u_1)$ as follows. For
$s < t$ there always exists a density $c_{t,s}$ on $[0,1]^2$ such that

$$f(u_t, u_s | u_{t-1}, \ldots, u_{s+1}) = f(u_t | u_{t-1}, \ldots, u_{s+1}) f(u_s | u_{t-1}, \ldots, u_{s+1}) \times c_{t,s}\left(F(u_t | u_{t-1}, \ldots, u_{s+1}), F(u_s | u_{t-1}, \ldots, u_{s+1}); u_{t-1}, \ldots, u_{s+1}\right). \qquad (2.6)$$

Here, $F(u_t | u_{t-1}, \ldots, u_{s+1})$ and $F(u_s | u_{t-1}, \ldots, u_{s+1})$ are conditional distribution functions of
$U_t$ and $U_s$, respectively. This is the theorem of Sklar applied conditional on $\{U_{t-1}, \ldots, U_{s+1}\}$.
In a vine copula, $c_{t,s}$ is the density of a bivariate 'pair-copula' and it is simplified by dropping
dependence on $(u_{t-1}, \ldots, u_{s+1})$; see Haff, Aas and Frigessi (2010) for a discussion of why this
is often a good approximation. By setting $s = 1$, application of Equation (2.6) gives

$$f(u_t | u_{t-1}, \ldots, u_1) = c_{t,1}\left(F(u_t | u_{t-1}, \ldots, u_2), F(u_1 | u_{t-1}, \ldots, u_2)\right) f(u_t | u_{t-1}, \ldots, u_2).$$

Denoting $u_{t|j} = F(u_t | u_{t-1}, \ldots, u_j)$ and $u_{j|t} = F(u_j | u_t, \ldots, u_{j+1})$, for $j < t$,^1 repeated
application of the above with $s = 2, 3, \ldots, t - 1$ leads to the following:

$$f(u_t | u_{t-1}, \ldots, u_1) = \prod_{s=1}^{t-1} c_{t,s}(u_{t|s+1}, u_{s|t-1}),$$

where the notation $u_{t|t} = u_t$, for $t = 1, \ldots, m$. Therefore, the D-vine copula is given by

$$c(u) = \prod_{t=2}^{m} \left\{ \prod_{s=1}^{t-1} c_{t,s}(u_{t|s+1}, u_{s|t-1}) \right\}, \qquad (2.7)$$

which is a product of $m(m-1)/2$ pair-copula densities, and $u = (u_{1|1}, \ldots, u_{m|m})$. If each
pair-copula $c_{t,s}$ has copula parameter $\phi_{t,s}$ then the parameter vector of the D-vine is
$\phi = \{\phi_{t,s}; t = 2, \ldots, m, s < t\}$. The hardest aspect of using the copula in Equation (2.7) is the
evaluation of the arguments of the component pair-copulas. Aas et al. (2009) give an $O(m^2)$
recursive algorithm for the evaluation of these from $u$, based on the identity in Joe (1996,
p. 125); see also Algorithm 1 in Smith et al. (2010).^2

Algorithm: (Evaluation of the Arguments of a D-vine)

For $k = 1, \ldots, m - 1$ and $i = k + 1, \ldots, m$:

Step 1: Compute $u_{i|i-k} = h_{i,i-k}(u_{i|i-k+1} | u_{i-k|i-1}; \phi_{i,i-k})$
Step 2: Compute $u_{i-k|i} = h_{i,i-k}(u_{i-k|i-1} | u_{i|i-k+1}; \phi_{i,i-k})$.

The functions $h_{t,s}(u_1 | u_2; \phi_{t,s}) = \int_0^{u_1} c_{t,s}(v, u_2; \phi_{t,s}) \, dv$ are the conditional distribution
functions for the pair-copula with density $c_{t,s}$; see Aas et al. (2009) and Smith et al. (2010) for
lists of these for some common bivariate copulas.
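The recursion above can be sketched in Python as follows. The sketch assumes that every pair-copula is Gaussian with correlation parameter $\phi_{t,s}$, for which the conditional distribution function has the closed form used in `gaussian_h`; this choice, the dictionary layout, and the function names are illustrative assumptions rather than part of the algorithm as published.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_h(u1, u2, rho):
    """h(u1 | u2; rho): conditional distribution function of a Gaussian pair-copula."""
    x1, x2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    return STD_NORMAL.cdf((x1 - rho * x2) / math.sqrt(1.0 - rho * rho))


def gaussian_pair_log_density(u1, u2, rho):
    """Log density of a bivariate Gaussian copula with correlation rho."""
    x1, x2 = STD_NORMAL.inv_cdf(u1), STD_NORMAL.inv_cdf(u2)
    one_minus = 1.0 - rho * rho
    quad = rho * rho * (x1 * x1 + x2 * x2) - 2.0 * rho * x1 * x2
    return -0.5 * math.log(one_minus) - 0.5 * quad / one_minus


def dvine_arguments(u, phi):
    """Return cond[(i, j)] = u_{i|j} (1-based indices) from u = [u_1, ..., u_m].

    phi[(t, s)] holds the parameter of pair-copula c_{t,s} for t > s.
    """
    m = len(u)
    cond = {(t, t): u[t - 1] for t in range(1, m + 1)}  # u_{t|t} = u_t
    for k in range(1, m):
        for i in range(k + 1, m + 1):
            s = i - k
            first, second = cond[(i, s + 1)], cond[(s, i - 1)]
            cond[(i, s)] = gaussian_h(first, second, phi[(i, s)])  # Step 1
            cond[(s, i)] = gaussian_h(second, first, phi[(i, s)])  # Step 2
    return cond


def dvine_log_density(u, phi):
    """Log of Equation (2.7): sum of the m(m-1)/2 pair-copula log densities."""
    cond = dvine_arguments(u, phi)
    m = len(u)
    return sum(gaussian_pair_log_density(cond[(t, s + 1)], cond[(s, t - 1)], phi[(t, s)])
               for t in range(2, m + 1) for s in range(1, t))
```

Setting `phi[(t, s)] = 0.0` for all pairs with `t - s > 1` makes those pair-copulas independence copulas, so the density collapses to a product of adjacent pair-copulas, which corresponds to first-order Markov serial dependence.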

Because any combination of bivariate copula functions can be employed for the pair- 
copulas, the D-vine copula can be extremely flexible. Moreover, other vine copulas can be 
constructed using alternative sequences of pair-copulas; see Bedford and Cooke (2002) and 
Aas et al. (2009). However, the D-vine at Equation (2.7) is uniquely well-motivated when
the elements of U are time-ordered. 

^1 Smith et al. (2010) denote $u_{t|j} = F(y_t | y_{t-1}, \ldots, y_j)$ and $u_{j|t} = F(y_j | y_t, \ldots, y_{j+1})$ for $Y_1, \ldots, Y_m$
continuous random variables. However, this can be shown to be equivalent to the definition of $u_{t|j}$ and $u_{j|t}$
employed here.

^2 The algorithm here corrects a minor subscript typographical error in the algorithm in Smith et al. (2010).
2.7 Measures of dependence

Nelsen (2006; Chap. 5) and Joe (1997; Chap. 2) discuss measures of dependence for copula 
models. In general, this is characterised by marginal pairwise dependencies between elements 
$Y_i$ and $Y_j$. Kendall's tau and Spearman's rho are the two most popular measures of pairwise
concordance, and empirical analysts are often familiar with sample versions based on ranked
data. However, when $Y_i$ and $Y_j$ are continuous-valued, and $Y$ follows the copula model at
Equation (2.1), the population equivalents can be expressed as

$$\tau_{i,j} = 4 \left( \int_0^1 \int_0^1 C_{i,j}(u_i, u_j) \, dC_{i,j}(u_i, u_j) \right) - 1 = 4 E(C_{i,j}(U_i, U_j)) - 1, \quad \text{and}$$

$$\rho_{i,j} = 12 \int_0^1 \int_0^1 u_i u_j \, dC_{i,j}(u_i, u_j) - 3 = 12 E(U_i U_j) - 3. \qquad (2.8)$$

In the above expressions, $C_{i,j}$ is the distribution function of $(U_i, U_j)$ and is a bivariate margin
of the $m$-dimensional copula function $C$. For some copulas $C_{i,j}$ can be computed in closed
form, but for others this is not possible. Similarly, the expectations in the expressions for
$\tau_{i,j}$ and $\rho_{i,j}$ can sometimes be computed in closed form, but for other choices of copulas they
are computable only numerically, or by Monte Carlo simulation. Within a Bayesian MCMC
framework the latter often proves straightforward; see Section 3.5.
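The expectations in Equation (2.8) can be approximated by Monte Carlo, as the following sketch shows for a Clayton copula with a fixed parameter. The sampler uses conditional inversion, and the function names are illustrative assumptions. In a Bayesian analysis the same averages could be taken over draws of $U$ generated at each posterior iterate of $\phi$, which would also integrate out parameter uncertainty.

```python
import random


def clayton_cdf(u1, u2, phi):
    """Clayton copula function for phi > 0 and u1, u2 in (0, 1]."""
    return (u1 ** (-phi) + u2 ** (-phi) - 1.0) ** (-1.0 / phi)


def sample_clayton(n, phi, rng):
    """Draw n pairs from a Clayton copula by inverting the conditional distribution."""
    draws = []
    for _ in range(n):
        u1 = 1.0 - rng.random()  # uniform on (0, 1], avoids a zero base
        w = 1.0 - rng.random()
        u2 = (u1 ** (-phi) * (w ** (-phi / (1.0 + phi)) - 1.0) + 1.0) ** (-1.0 / phi)
        draws.append((u1, u2))
    return draws


def dependence_estimates(draws, phi):
    """Monte Carlo versions of tau = 4 E(C(U1, U2)) - 1 and rho = 12 E(U1 U2) - 3."""
    n = len(draws)
    tau = 4.0 * sum(clayton_cdf(u1, u2, phi) for u1, u2 in draws) / n - 1.0
    rho = 12.0 * sum(u1 * u2 for u1, u2 in draws) / n - 3.0
    return tau, rho
```

For the Clayton copula the estimate of Kendall's tau can be compared with the closed form $\phi/(\phi + 2)$ in Table 1, which gives a check on the simulation.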

In many situations high values of $Y_i$ and $Y_j$ exhibit different levels (or even directions)
of dependence than low values of $Y_i$ and $Y_j$; something that is called 'asymmetric (pairwise)
dependence'. As noted by Nelsen (2006, Chap. 4), when $Y_i$ and $Y_j$ are continuous-valued,
then the dependence properties of the bivariate margin in these two variables are characterised
by the dependence properties between $U_i$ and $U_j$. In this case, measures of asymmetric
dependence are often based on the conditional probabilities

$$\lambda^U_{i,j}(\alpha) = \mathrm{pr}(U_i > \alpha | U_j > \alpha)$$
$$\lambda^L_{i,j}(\alpha) = \mathrm{pr}(U_i < \alpha | U_j < \alpha),$$

where $0 < \alpha < 1$. The limits of these are called the upper and lower tail dependencies
(Joe 1997, p. 33), and denoted as

$$\lambda^U_{i,j} = \lim_{\alpha \uparrow 1} \lambda^U_{i,j}(\alpha), \quad \lambda^L_{i,j} = \lim_{\alpha \downarrow 0} \lambda^L_{i,j}(\alpha).$$

For bivariate copula models there is only a single pairwise combination, $Y_1$ and $Y_2$, and
for many bivariate copula functions dependence measures are available in closed form. For
example, Table 1 gives expressions for measures of dependence for the Frank, Gumbel and
Clayton copulas; see Joe (1997), Nelsen (2006) and Huard, Evin and Favre (2006) for others.
Pairwise dependence measures in multivariate $m$-dimensional elliptical copulas can also have
closed form expressions. In particular, the Gaussian copula has zero tail dependence, with
$\lambda^U_{i,j} = \lambda^L_{i,j} = 0$; whereas, the $t$ copula has tail dependence that is non-zero, but is symmetric
with $\lambda^U_{i,j} = \lambda^L_{i,j}$. When employing a copula model it is important to ensure that the copula
has dependence properties that are consistent with those exhibited by the data.
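The conditional probabilities that define tail dependence can be written in terms of the bivariate copula function alone, since $\mathrm{pr}(U_1 < \alpha | U_2 < \alpha) = C(\alpha, \alpha)/\alpha$ and $\mathrm{pr}(U_1 > \alpha | U_2 > \alpha) = (1 - 2\alpha + C(\alpha, \alpha))/(1 - \alpha)$. A small sketch, with illustrative function names, evaluates these near the limits for the Clayton and Gumbel copulas of Table 1:

```python
import math


def clayton_cdf(u1, u2, phi=2.0):
    """Clayton copula function for phi > 0."""
    return (u1 ** (-phi) + u2 ** (-phi) - 1.0) ** (-1.0 / phi)


def gumbel_cdf(u1, u2, phi=2.0):
    """Gumbel copula function for phi >= 1."""
    s = (-math.log(u1)) ** phi + (-math.log(u2)) ** phi
    return math.exp(-(s ** (1.0 / phi)))


def lower_tail(C, alpha):
    """pr(U1 < alpha | U2 < alpha) = C(alpha, alpha) / alpha."""
    return C(alpha, alpha) / alpha


def upper_tail(C, alpha):
    """pr(U1 > alpha | U2 > alpha) = (1 - 2 alpha + C(alpha, alpha)) / (1 - alpha)."""
    return (1.0 - 2.0 * alpha + C(alpha, alpha)) / (1.0 - alpha)
```

With $\phi = 2$, `lower_tail(clayton_cdf, 1e-6)` is close to $2^{-1/2}$ and `upper_tail(gumbel_cdf, 1 - 1e-6)` is close to $2 - 2^{1/2}$, matching the closed forms, while the opposite tails are close to zero. Convergence to zero in the lower tail of the Gumbel copula is slow, so a very small $\alpha$ is needed there.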

3 Bayesian Inference for Continuous Margins 

When the data are continuous, the likelihood of $n$ independent observations $y = \{y_1, \ldots, y_n\}$,
each distributed as Equation (2.1), is $f(y | \Theta, \phi) = \prod_{i=1}^{n} f(y_i | \Theta, \phi)$, where $y_i = (y_{i1}, \ldots, y_{im})'$
and

$$f(y_i | \Theta, \phi) = c(u_i; \phi) \prod_{j=1}^{m} f_j(y_{ij}; \theta_j). \qquad (3.1)$$

Here, $u_i = (u_{i1}, \ldots, u_{im})'$, $u_{ij} = F_j(y_{ij}; \theta_j)$, $\Theta = \{\theta_1, \ldots, \theta_m\}$ are any parameters of the
marginal models, and $f_j(y_{ij}; \theta_j) = \frac{\partial}{\partial y_{ij}} F_j(y_{ij}; \theta_j)$ is the marginal density of $y_{ij}$. Initially,
Equation (3.1) appears separable in $\theta_1, \ldots, \theta_m$ and $\phi$, but this is not the case because $u_i$
depends on $\Theta$. Most parametric copula functions have analytical expressions for the densities
$c(u; \phi)$, so that maximum likelihood estimation is often straightforward. However, there are
a number of circumstances where a Bayesian analysis can be preferable:

(i) For more complex marginal models $F_j(y_{ij}; \theta_j)$ and/or copula functions $C(u; \phi)$, the
likelihood can be hard to maximise directly. One solution is to use a two stage estimator,
where the marginal model parameters $\theta_j$ are estimated first, and then $\phi$ is estimated
conditional on these. In the copula literature, this is called 'inference for margins'; see
Joe (2005) and references therein for a discussion. Another solution is to use an
iterative scoring algorithm to maximise the likelihood, as suggested by Song, Fan and
Kalbfleisch (2006). However, an attractive Bayesian alternative in this circumstance
is to construct inference from the joint posterior $f(\Theta, \phi | y)$ evaluated in a Monte Carlo
manner, with $\Theta$ and $\phi$ generated separately in a Gibbs style sampling scheme; see
Pitt, Chan and Kohn (2006), Silva and Lopes (2008) and Ausin and Lopes (2010) for
discussions.

(ii) Bayesian hierarchical modelling has proven very successful for the modelling of mul- 
tivariate data. This includes parsimonious modelling of covariance structures using 
Bayesian selection and model averaging; see Giudici and Green (1999), Smith and 
Kohn (2002), Wong, Carter and Kohn (2003) and Frühwirth-Schnatter & Tüchler (2008)
for examples. Bayesian selection can be extended to nonlinear dependence by considering
priors with point mass components for $\phi$. For example, Pitt, Chan and Kohn (2006)
use a 'spike and slab' prior similar to Wong, Carter and Kohn (2003) for the off-diagonal
elements of the concentration matrix $\Gamma^{-1}$ of a Gaussian copula. Smith et al. (2010)
use Bayesian selection ideas to mix over independent and dependent pair-copulas in a
vine copula. Hierarchical models can also be employed for the margins $F_j(y_j; \theta_j)$, and
estimated jointly with the dependence structure captured by the copula function. 

(iii) When estimating a copula model, the objective is often to construct inference on 
measures of dependence, quantiles and/or functionals of the random variable vector 
$Y$ or parameters $(\Theta, \phi)$. Evaluation of the posterior distribution of these quantities is
often straightforward using MCMC methods. 

3.1 The Gaussian copula model 

To illustrate, Bayesian estimation of a Gaussian copula model for continuous margins is
outlined as suggested by Pitt, Chan and Kohn (2006). Following Song (2000) and others,
derivation of the copula density is straightforward by differentiation of Equation (2.3), so
that

$$c(u;\Gamma) = \frac{\partial}{\partial u} C(u;\Gamma) = |\Gamma|^{-1/2} \exp\left\{ -\tfrac{1}{2}\, x'(\Gamma^{-1} - I)\,x \right\} , \qquad (3.2)$$

where $x = (\Phi_1^{-1}(u_1; 1), \ldots, \Phi_1^{-1}(u_m; 1))'$. Thus, the likelihood at Equation (3.1) is a function
of $\Theta$ and $\Gamma$, and can be written as

$$f(y \mid \Theta, \Gamma) = |\Gamma|^{-n/2} \prod_{i=1}^{n} \left( \exp\left\{ -\tfrac{1}{2}\, x_i'(\Gamma^{-1} - I)\,x_i \right\} \prod_{j=1}^{m} f_j(y_{ij}; \theta_j) \right) , \qquad (3.3)$$

where $x_i = (x_{i1}, \ldots, x_{im})'$, $x_{ij} = \Phi_1^{-1}(u_{ij}; 1)$ and $u_{ij} = F_j(y_{ij}; \theta_j)$. Bayesian estimation can
be undertaken using the following MCMC sampling scheme:

Sampling Scheme: (Estimation of a Gaussian Copula) 

Step 1: Generate from $f(\theta_j \mid \{\Theta \backslash \theta_j\}, \Gamma, y)$ for $j = 1, \ldots, m$.
Step 2: Generate from $f(\Gamma \mid \Theta, y)$.

Here, $\{A \backslash B\}$ is notation for $A$ with component $B$ omitted. Steps 1 and 2 are repeated
(in sequence) a large number of times, with each repeat usually called a 'sweep' in the
Bayesian literature. The scheme requires an initial (feasible) state for the parameter values,
which is denoted here as $(\Theta^{[0]}, \phi^{[0]})$. The iterates from the scheme form a Markov chain,
which can be shown to converge to the joint posterior distribution $f(\Theta, \phi \mid y)$, which is the
(unique) invariant distribution of the chain. After an initial number of sweeps, the chain
is assumed to have converged and subsequent iterates form a Monte Carlo sample from
which the parameters are estimated, and other Bayesian inference obtained as outlined in
Section 3.5. For introductions to MCMC methods for computing Bayesian posterior inference
see Tanner (1996) and Robert and Casella (2006).
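
As an illustration, the logarithm of the likelihood at Equation (3.3) can be evaluated along the following lines. This is a minimal sketch rather than the implementation of any of the cited authors: the function names and the optional `log_marginal_dens` argument are hypothetical, and the copula data $u_{ij} = F_j(y_{ij}; \theta_j)$ are assumed to have been computed already.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal from the Python standard library


def normal_scores(u):
    """Map copula data u in (0, 1) to x = Phi^{-1}(u), element by element."""
    return np.vectorize(_STD_NORMAL.inv_cdf)(np.asarray(u, dtype=float))


def gaussian_copula_loglik(u, gamma, log_marginal_dens=None):
    """Log of Equation (3.3) for an (n, m) array of copula data u.

    gamma is the (m, m) correlation matrix. log_marginal_dens, if given,
    is an (n, m) array holding log f_j(y_ij; theta_j) (hypothetical input).
    """
    u = np.asarray(u, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    n, m = u.shape
    x = normal_scores(u)
    sign, logdet = np.linalg.slogdet(gamma)
    if sign <= 0:
        raise ValueError("gamma must be positive definite")
    a = np.linalg.inv(gamma) - np.eye(m)
    # quadratic form x_i' (Gamma^{-1} - I) x_i for every observation i
    quad = np.einsum("ij,jk,ik->i", x, a, x)
    loglik = -0.5 * n * logdet - 0.5 * quad.sum()
    if log_marginal_dens is not None:
        loglik += float(np.sum(log_marginal_dens))
    return float(loglik)
```

Working on the log scale avoids numerical underflow when $n$ is large, and the same function can be reused inside any of the MH steps described in this section.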
The posterior in Step 1 is given by

$$f(\theta_j \mid \{\Theta \backslash \theta_j\}, \Gamma, y) \propto f(y \mid \Theta, \Gamma)\,\pi(\theta_j) \propto \left( \prod_{i=1}^{n} \exp\left\{ -\tfrac{1}{2}\, x_i'(\Gamma^{-1} - I)\,x_i \right\} f_j(y_{ij}; \theta_j) \right) \pi(\theta_j) , \qquad (3.4)$$

where $\pi(\theta_j)$ is the marginal prior for $\theta_j$. In general, the density is unrecognisable because $x_{ij}$
is a function of $\theta_j$, so Pitt, Chan and Kohn (2006) suggest using a Metropolis-Hastings (MH)
step with a multivariate t distribution as a proposal to generate $\theta_j$ in Step 1. The mean of the
proposal is the mode of this conditional posterior, and the Hessian $H$ of its logarithm, evaluated
at the mode, is calculated numerically using finite difference methods. The scale matrix of the MH
proposal is $-H^{-1}$, and a low number of degrees of freedom, such as 5 or 7, is employed so that the
proposal dominates the target density in the tails. If $\theta_j$ has too many elements for $H$ to be
evaluated in a numerically stable and computationally feasible fashion, $\theta_j$ can be partitioned
and its blocks generated separately. Alternative MH steps are also possible, including those based on
the widely employed random walk proposals.

The approach used to generate $\Gamma$ in Step 2 varies depending on the prior and matrix
parameterisation adopted, of which there are several alternatives. Pitt, Chan and Kohn (2006)
consider a prior on the off-diagonal elements of $\Gamma^{-1}$, which is equivalent to assuming a prior
for the partial correlations $\mathrm{Corr}(X_t, X_s \mid X_{j \notin \{s,t\}})$ for $t = 2, \ldots, m$; $s < t$. Hoff (2007) suggests
using a prior for $\Gamma$ in a Gaussian copula that results from an inverse Wishart prior for a
covariance matrix. However, because $\Gamma$ is just a correlation matrix (for $X$), any prior for a
correlation matrix can also be used; for example, see those suggested by Barnard, McCulloch
and Meng (2000), Liechty, Liechty and Müller (2004), Armstrong et al. (2009), Daniels and
Pourahmadi (2009) and references therein.

3.1.1 Prior based on a Cholesky factor:

One such prior for a correlation matrix is based on a Cholesky factorisation, which is
particularly suited to longitudinal data. This prior uses the decomposition

$$\Gamma = \mathrm{diag}(\Sigma)^{-1/2}\, \Sigma\, \mathrm{diag}(\Sigma)^{-1/2} , \qquad (3.5)$$

where $\Sigma$ is a non-unique positive definite matrix, and $\mathrm{diag}(\Sigma)$ is a diagonal matrix comprised
of the leading diagonal of $\Sigma$. The matrix $\Sigma^{-1} = R'R$, with $R = \{r_{kj}\}$ being an upper
triangular Cholesky factor, and to ensure that the parameterisation is unique, $r_{kk} = 1$,
for $k = 1, \ldots, m$. Generation of $\Gamma$ in Step 2 is undertaken by generating the elements
$\{r_{kj};\ j = 2, \ldots, m,\ k < j\}$ one at a time from the conditional posterior

$$f(r_{kj} \mid \{R \backslash r_{kj}\}, \Theta, y) \propto |\Gamma|^{-n/2} \left( \prod_{i=1}^{n} \exp\left\{ -\tfrac{1}{2}\, x_i'(\Gamma^{-1} - I)\,x_i \right\} \right) ,$$

using random walk MH; see Tanner (1996; p. 177) for a discussion of this simulation tool.
Once an iterate of $R$ is obtained, the iterate of $\Gamma$ can be computed using the relationship at
Equation (3.5). Using a different prior, Hoff (2007) uses a similar approach to generate a
correlation matrix for a Gaussian copula.
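
The mapping from the free elements of $R$ to $\Gamma$, and the log of the conditional posterior used in the random walk MH step, can be sketched as follows. The sketch assumes that the Cholesky factor is that of $\Sigma^{-1}$ (so $\Sigma = (R'R)^{-1}$), that the free elements are supplied in row-major order of the strict upper triangle, and that the normal scores $x$ are held fixed; the function names are hypothetical.

```python
import numpy as np


def correlation_from_cholesky(r_offdiag, m):
    """Build Gamma via Equation (3.5) from the free elements r_kj, k < j.

    Assumption: R is upper triangular with unit diagonal and
    Sigma^{-1} = R'R, so that Sigma = (R'R)^{-1}.
    """
    r = np.eye(m)
    r[np.triu_indices(m, k=1)] = r_offdiag
    sigma = np.linalg.inv(r.T @ r)
    scale = 1.0 / np.sqrt(np.diag(sigma))
    # diag(Sigma)^{-1/2} Sigma diag(Sigma)^{-1/2}
    return scale[:, None] * sigma * scale[None, :]


def log_conditional_posterior(r_offdiag, x):
    """Log of the conditional posterior kernel for the elements of R.

    x is the (n, m) array of normal scores x_ij.
    """
    n, m = x.shape
    gamma = correlation_from_cholesky(r_offdiag, m)
    _, logdet = np.linalg.slogdet(gamma)
    a = np.linalg.inv(gamma) - np.eye(m)
    return float(-0.5 * n * logdet - 0.5 * np.sum((x @ a) * x))
```

A random walk MH update would perturb a single element of `r_offdiag`, evaluate `log_conditional_posterior` at the old and new values, and accept the new value with probability equal to the minimum of one and the exponentiated difference.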



3.1.2 Prior based on partial correlations: 

Daniels and Pourahmadi (2009) suggest parameterising a correlation matrix using the partial
correlations

$$\lambda_{t,s} = \mathrm{Corr}(X_t, X_s \mid X_{t-1}, \ldots, X_{s+1}) , \quad \text{for } s < t . \qquad (3.6)$$

This prior is based on the work of Joe (2006), who notes that these are unconstrained
on $(-1,1)$, and that $\Lambda = \{\lambda_{t,s};\ t = 2, \ldots, m,\ s < t\}$ provides a unique parameterisation
of $\Gamma$. Note that $\lambda_{t,s}$ is sometimes called a 'semi-partial' correlation because it is not the
correlation conditional on all other variables $\mathrm{Corr}(X_t, X_s \mid X_{j \notin \{t,s\}})$, which is the 'full' partial
correlation considered by Pitt, Chan and Kohn (2006). One advantage is that the conditional
distribution of $\lambda_{t,s} \mid \{\Lambda \backslash \lambda_{t,s}\}$ is only bounded to $(-1,1)$, whereas the conditional distribution
of each full partial correlation has more complex bounds. Daniels and Pourahmadi (2009)
suggest using either Beta or uniform priors for $\lambda_{t,s}$, which can be employed and Step 2
undertaken by generating the elements of $\Lambda$ one at a time, again using MH with a random
walk proposal. Once an iterate of $\Lambda$ is obtained, $\Gamma$ can be computed using the identity at
equation (2) of Daniels and Pourahmadi (2009).

There is an interesting link between the Gaussian copula parameterised by the partial
correlations $\Lambda$, and the D-vine copula in Equation (2.7). When the pair-copulas in the D-vine
are bivariate Gaussian copulas, with densities

$$c_{t,s}(u_1, u_2; \phi_{t,s}) = \frac{1}{\sqrt{1 - \phi_{t,s}^2}} \exp\left\{ -\frac{\phi_{t,s}^2 (x_1^2 + x_2^2) - 2 \phi_{t,s} x_1 x_2}{2(1 - \phi_{t,s}^2)} \right\} , \qquad (3.7)$$

where $x_1 = \Phi_1^{-1}(u_1; 1)$ and $x_2 = \Phi_1^{-1}(u_2; 1)$, then the D-vine copula can be shown to be a
Gaussian copula with copula density at Equation (3.2); see Aas et al. (2009) and Haff, Aas
and Frigessi (2010). In this case, the individual pair-copula parameters $\phi_{t,s}$ above are the
partial correlations $\lambda_{t,s}$.
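
To make the parameterisation concrete, the following sketch computes $\Gamma$ from the semi-partial correlations. It uses the standard recursion that inverts the definition of a partial correlation, which is intended to be equivalent to (but is not copied from) the identity in Daniels and Pourahmadi (2009). The function name and the storage convention are assumptions: indices are zero-based and the entry `partials[s, t]` with `s < t` holds $\lambda_{t,s}$.

```python
import numpy as np


def correlation_from_partials(partials):
    """Build a correlation matrix from semi-partial correlations.

    partials[s, t] (s < t, zero-based) is the correlation between X_s and
    X_t conditional on the intermediate variables X_{s+1}, ..., X_{t-1}.
    Other entries of the array are ignored.
    """
    m = partials.shape[0]
    gamma = np.eye(m)
    for lag in range(1, m):              # fill one off-diagonal band at a time
        for s in range(m - lag):
            t = s + lag
            if lag == 1:
                rho = partials[s, t]     # nothing to condition on
            else:
                mid = slice(s + 1, t)
                r2 = gamma[mid, mid]     # correlations of intermediate variables
                r1 = gamma[s, mid]
                r3 = gamma[t, mid]
                w1 = np.linalg.solve(r2, r1)
                w3 = np.linalg.solve(r2, r3)
                d = np.sqrt((1.0 - r1 @ w1) * (1.0 - r3 @ w3))
                rho = r1 @ w3 + partials[s, t] * d
            gamma[s, t] = gamma[t, s] = rho
    return gamma
```

Because each $\lambda_{t,s}$ is free on $(-1,1)$, any array of values in that interval yields a positive definite $\Gamma$, which is what makes element-by-element random walk MH convenient in this parameterisation.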

3.2 Bayesian selection in a Gaussian copula 

Bayesian selection approaches can be employed to allow for parsimonious modelling of $\Gamma$ in a
Gaussian copula. It is well known that Bayesian selection can significantly improve estimates
of a covariance matrix compared to maximum likelihood; see Yang and Berger (1994), Giudici
and Green (1999), Smith and Kohn (2002), Wong, Carter and Kohn (2003), Frühwirth-Schnatter
and Tüchler (2008) and others for extensive evidence to this effect. Pitt, Chan
and Kohn (2006) show that this is also the case when estimating the dependence structure
of $Y$ using a Gaussian copula model. They consider a selection prior with point mass
probabilities on the off-diagonal elements of $\Gamma^{-1}$. In the Gaussian copula this is equivalent
to identifying for which pairs $(t,s)$ the full partial correlation $\mathrm{Corr}(X_t, X_s \mid X_{j \notin \{s,t\}}) = 0$.
This also corresponds to conditional independence between $Y_t$ and $Y_s$, with the conditional
density

$$f(y_t, y_s \mid y_{j \notin \{s,t\}}) = f(y_t \mid y_{j \notin \{s,t\}})\, f(y_s \mid y_{j \notin \{s,t\}}) .$$

3.2.1 Priors for selection: 

Bayesian selection can also be undertaken for the semi-partial correlations $\Lambda$ defined in
Equation (3.6). In the Gaussian copula this is equivalent to determining for which pairs
$(t, s)$ there is conditional independence between elements of $Y$, with conditional density

$$f(y_t, y_s \mid y_{t-1}, \ldots, y_{s+1}) = f(y_t \mid y_{t-1}, \ldots, y_{s+1})\, f(y_s \mid y_{t-1}, \ldots, y_{s+1}) ,$$

when $\lambda_{t,s} = 0$. To introduce a point mass probability for this value, binary indicator variables
$\gamma = \{\gamma_{t,s};\ t = 2, \ldots, m,\ s < t\}$ are introduced, such that

$$\lambda_{t,s} = 0 \quad \text{iff} \quad \gamma_{t,s} = 0 .$$

The non-zero partial correlations $\lambda_{t,s} \mid \gamma_{t,s} = 1$ are independently distributed with proper prior
densities $\pi(\lambda_{t,s})$. Joe (2006) highlights that $\lambda_{t,s} \mid \{\Lambda \backslash \lambda_{t,s}\}$ are unconstrained on $(-1,1)$, so
that either independent uniform or Beta priors are simple choices for $\pi(\lambda_{t,s})$; see Daniels and
Pourahmadi (2009). In comparison, each full partial correlation has bounds that are complex
functions of the other full partial correlations and computationally demanding to evaluate.
For this reason, Bayesian selection using the semi-partial correlations $\Lambda$ is computationally less
burdensome than using the full partial correlations.

The prior on the indicators $\gamma$ can be highly informative when the number of indicators
$N = m(m-1)/2$ is large. For example, if $w_\gamma = \sum_{t,s} \gamma_{t,s}$ is the number of non-zero elements
in $\Lambda$, then assuming flat marginal priors $\pi(\gamma_{t,s}) = 1/2$ puts high prior weight on values for
$w_\gamma \approx N/2$. This problem has been noted widely in the variable selection literature; see Kohn,
Smith and Chan (2001), Zhang, Dai and Jordan (2011) and Bottolo and Richardson (2010).
One solution is to employ the conditional prior

$$\pi(\gamma_{t,s} = 1 \mid \{\gamma \backslash \gamma_{t,s}\}) \propto B(N - w_\gamma + 1, w_\gamma + 1) , \qquad (3.8)$$

where $B(\cdot, \cdot)$ is the beta function. This prior has been used effectively in the Bayesian selection
literature, with early uses in Smith (2000) and Smith and Kohn (2002). It corresponds
to assuming the joint mass function

$$\pi(\gamma) = B(N - w_\gamma + 1, w_\gamma + 1) = \frac{1}{N+1} \binom{N}{w_\gamma}^{-1} .$$

The implied prior for the total number of non-zero elements of $\Lambda$ is uniform, with $\pi(w_\gamma) = 1/(1 + N)$,
while the marginal priors $\pi(\gamma_{t,s})$ are all equal; see Scott and Berger (2010) for
a discussion. This prior is also equivalent to the uniform volume-based prior suggested by
Wong, Carter and Kohn (2003) and Cripps, Carter and Kohn (2005) on the model space.

3.2.2 MCMC sampling scheme:

To evaluate the joint posterior distribution of the indicator variables and the partial
correlations $\Lambda$, latent variables $\tilde{\lambda}_{t,s}$ for $t = 2, \ldots, m$, $s < t$, are introduced such that $\lambda_{t,s} = \tilde{\lambda}_{t,s}$
if $\gamma_{t,s} = 1$. Notice that $\lambda_{t,s}$ is known exactly given the pair $(\tilde{\lambda}_{t,s}, \gamma_{t,s})$, so it is sufficient
to implement a sampling scheme to evaluate the joint posterior $f(\tilde{\Lambda}, \gamma, \Theta \mid y)$, where
$\tilde{\Lambda} = \{\tilde{\lambda}_{t,s};\ t = 2, \ldots, m,\ s < t\}$, as below.

Sampling Scheme: (Bayesian Selection for a Gaussian Copula)

Step 1: Generate from $f(\theta_j \mid \{\Theta \backslash \theta_j\}, \Gamma, y)$ for $j = 1, \ldots, m$.
Step 2: Generate from $f(\tilde{\lambda}_{t,s}, \gamma_{t,s} \mid \Theta, \{\tilde{\Lambda} \backslash \tilde{\lambda}_{t,s}\}, \{\gamma \backslash \gamma_{t,s}\}, y)$ for $t = 2, \ldots, m$, $s < t$.
Step 3: Compute $\Lambda$ from $(\tilde{\Lambda}, \gamma)$, and then $\Gamma$ from $\Lambda$.



Step 1 is unchanged from that in Section 3.1, while Step 2 consists of MH steps to
generate each pair $(\tilde{\lambda}_{t,s}, \gamma_{t,s})$, conditional on the others. The MH proposal density is

$$q(\tilde{\lambda}_{t,s}, \gamma_{t,s}) = q_1(\gamma_{t,s})\, q_2(\tilde{\lambda}_{t,s}) .$$

To generate from the proposal $q$ above, an indicator is generated from $q_1(\gamma_{t,s} = 0) = q_1(\gamma_{t,s} = 1) = 1/2$,
and $\tilde{\lambda}_{t,s}$ from a symmetric random walk proposal $q_2$ constrained to $(-1, 1)$. For
example, one such symmetric proposal for $q_2$ is to generate a new value of $\tilde{\lambda}_{t,s}$ from a normal
distribution with mean equal to the old value, standard deviation 0.01, and constrained to
$(-1, 1)$.

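Temporarily dropping the subscripts $(t, s)$ for convenience, a new iterate $(\tilde{\lambda}^{new}, \gamma^{new})$
generated from the proposal $q$ is accepted over the old value $(\tilde{\lambda}^{old}, \gamma^{old})$ with probability

$$\min\left( 1,\ \alpha\, \frac{\pi(\tilde{\lambda}^{new})}{\pi(\tilde{\lambda}^{old})}\, \kappa \right) , \qquad (3.9)$$

where $\kappa$ is an adjustment due to the bounds $(-1, 1)$ on $\tilde{\lambda}$. If the symmetric density $q_2(\cdot)$ has
distribution function $Q_2(\cdot)$, then

$$\kappa = \frac{Q_2(1 - \tilde{\lambda}^{old}) - Q_2(-1 - \tilde{\lambda}^{old})}{Q_2(1 - \tilde{\lambda}^{new}) - Q_2(-1 - \tilde{\lambda}^{new})} .$$

If a uniform prior is adopted for $\lambda_{t,s}$, as suggested in Daniels and Pourahmadi (2009), then
the ratio $\pi(\tilde{\lambda}^{new})/\pi(\tilde{\lambda}^{old}) = 1$ in Equation (3.9). At each generation in Step 2, the likelihood
in Equation (3.3) is a function of $(\Lambda, \gamma)$, so it can be written here as $L(\Lambda, \gamma)$. Using this
notation, the value $\alpha$ in Equation (3.9) can be expressed separately for the four possible
configurations of $(\gamma^{old}, \gamma^{new})$ as:

$$\alpha\left( (\tilde{\lambda}^{old}, \gamma^{old} = 0) \to (\tilde{\lambda}^{new}, \gamma^{new} = 0) \right) = 1 ,$$

$$\alpha\left( (\tilde{\lambda}^{old}, \gamma^{old} = 0) \to (\tilde{\lambda}^{new}, \gamma^{new} = 1) \right) = \frac{L(\tilde{\lambda}^{new}, \gamma^{new} = 1)\,\delta_1}{L(0, \gamma^{old} = 0)\,\delta_0} ,$$

$$\alpha\left( (\tilde{\lambda}^{old}, \gamma^{old} = 1) \to (\tilde{\lambda}^{new}, \gamma^{new} = 0) \right) = \frac{L(0, \gamma^{new} = 0)\,\delta_0}{L(\tilde{\lambda}^{old}, \gamma^{old} = 1)\,\delta_1} ,$$

$$\alpha\left( (\tilde{\lambda}^{old}, \gamma^{old} = 1) \to (\tilde{\lambda}^{new}, \gamma^{new} = 1) \right) = \frac{L(\tilde{\lambda}^{new}, \gamma^{new} = 1)}{L(\tilde{\lambda}^{old}, \gamma^{old} = 1)} ,$$

where $\delta_0$ and $\delta_1$ are the conditional probabilities from Equation (3.8) that $\gamma_{t,s} = 0$ and
1, respectively. Notice that when $(\gamma^{old} = 0) \to (\gamma^{new} = 0)$ the likelihood does not need
computing to evaluate the acceptance ratio at Equation (3.9). This case will occur frequently
whenever there is a high degree of sparsity in the dependence structure, so that each sweep
of Step 2 will be much faster than if no selection was considered.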
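
A single MH update of one pair can be sketched as below. This is an illustrative sketch under stated assumptions, not the cited authors' code: a uniform prior is assumed for the latent partial correlation (so the prior ratio is one), the proposal $q_2$ is a normal random walk truncated to $(-1, 1)$ by rejection, `loglik` is a hypothetical user-supplied function returning the log-likelihood as a function of the effective partial correlation for this pair, and all function names are made up for the example.

```python
import math
from statistics import NormalDist

import numpy as np


def delta_one(n_pairs, w_others):
    """Conditional prior probability that an indicator equals 1.

    Normalises Equation (3.8) over the two values of the indicator;
    w_others is the number of non-zero indicators among the other pairs.
    """
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    w_one = w_others + 1
    log_p1 = log_beta(n_pairs - w_one + 1, w_one + 1)
    log_p0 = log_beta(n_pairs - w_others + 1, w_others + 1)
    return 1.0 / (1.0 + math.exp(log_p0 - log_p1))


def kappa(lam_old, lam_new, sd):
    """Adjustment for truncating the random walk proposal to (-1, 1)."""
    q2 = NormalDist(0.0, sd)
    num = q2.cdf(1.0 - lam_old) - q2.cdf(-1.0 - lam_old)
    den = q2.cdf(1.0 - lam_new) - q2.cdf(-1.0 - lam_new)
    return num / den


def mh_update(lam_old, gam_old, loglik, delta1, rng, sd=0.01):
    """One MH update of the pair (latent partial correlation, indicator)."""
    gam_new = int(rng.integers(0, 2))            # q1: 0 or 1 with probability 1/2
    while True:                                  # q2: truncated normal random walk
        lam_new = float(rng.normal(lam_old, sd))
        if -1.0 < lam_new < 1.0:
            break
    if gam_old == 0 and gam_new == 0:
        log_alpha = 0.0                          # no likelihood evaluation needed
    else:
        d_old = delta1 if gam_old == 1 else 1.0 - delta1
        d_new = delta1 if gam_new == 1 else 1.0 - delta1
        ll_old = loglik(lam_old if gam_old == 1 else 0.0)
        ll_new = loglik(lam_new if gam_new == 1 else 0.0)
        log_alpha = ll_new + math.log(d_new) - ll_old - math.log(d_old)
    # uniform prior assumed, so the prior ratio in Equation (3.9) equals one
    log_accept = log_alpha + math.log(kappa(lam_old, lam_new, sd))
    if rng.uniform() < math.exp(min(0.0, log_accept)):
        return lam_new, gam_new
    return lam_old, gam_old
```

In a full sampler, `loglik` would rebuild $\Gamma$ from the current $\Lambda$ with the relevant element replaced, and `delta1` would be recomputed with `delta_one` from the current number of non-zero indicators among the other pairs.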

Reintroducing subscripts, Step 3 of the sampling scheme is straightforward, with each
partial correlation

$$\lambda_{t,s} = \begin{cases} 0 & \text{if } \gamma_{t,s} = 0 \\ \tilde{\lambda}_{t,s} & \text{if } \gamma_{t,s} = 1 \end{cases}$$

and the correlation matrix $\Gamma$ can be obtained directly from $\Lambda$ using the relationship in
Joe (2006) and Daniels and Pourahmadi (2009).



3.3 Bayesian estimation and selection for a D-vine 

Bayesian estimation for vine copulas is discussed in Min and Czado (2010; 2011) and Smith 
et al. (2010). The latter authors consider Bayesian selection and model averaging via the 
introduction of indicator variables in the tradition of Bayesian variable selection. It is this 
approach that is outlined here, although readers are referred to Smith et al. (2010) for a full 
exposition. 

The objective of Bayesian selection for a vine copula is to identify component pair-copulas
that are equal to the bivariate independence copula. Recall that the bivariate
independence copula has copula function $C(u_1, u_2) = u_1 u_2$, and corresponding copula density
$c(u_1, u_2) = \partial^2 C(u_1, u_2)/\partial u_1 \partial u_2 = 1$. This leads to a parsimonious representation because the
independence copula is not a function of any parameters.

For the D-vine with copula density at Equation (2.7), Bayesian selection introduces
indicator variables $\gamma = \{\gamma_{t,s};\ t = 2, \ldots, m,\ s < t\}$, where

$$c_{t,s}(u_1, u_2) = \begin{cases} 1 & \text{if } \gamma_{t,s} = 0 \\ c^{*}_{t,s}(u_1, u_2; \phi_{t,s}) & \text{if } \gamma_{t,s} = 1 \end{cases} . \qquad (3.10)$$

In the above, $c^{*}_{t,s}$ is a pre-specified bivariate copula density with parameter $\phi_{t,s}$.^1 The
copula type can vary with $(t, s)$, but for simplicity only the case where $c^{*}_{t,s}(u_1, u_2; \phi_{t,s}) =
c^{*}(u_1, u_2; \phi_{t,s})$ is considered here. That is, each pair-copula $c_{t,s}$ is either an independence
copula, or a bivariate copula of the same form for all pair-copulas, but with differing parameter
values. From Equation (2.6) it follows that when $c_{t,s}(u_1, u_2) = 1$,
$f(u_t, u_s \mid u_{t-1}, \ldots, u_{s+1}) = f(u_t \mid u_{t-1}, \ldots, u_{s+1}) \times f(u_s \mid u_{t-1}, \ldots, u_{s+1})$,
so that there is conditional independence between $U_t$ and $U_s$.

The pre-specified bivariate copula can nest the independence copula, so that there exists
a value $\phi^{+}$ such that $c^{*}(u_1, u_2; \phi^{+}) = 1$. In this case, the condition at Equation (3.10) can
be rewritten as $c_{t,s}(u_1, u_2) = c^{*}(u_1, u_2; \phi_{t,s})$, with $\phi_{t,s} = \phi^{+}$ iff $\gamma_{t,s} = 0$. One example of such
a copula is the Gumbel when $\phi^{+} = 1$, which is easily seen by substituting the value into the
copula density, as given in Table 1.

To estimate the joint posterior, latent variables $\tilde{\phi}_{t,s}$, for $t = 2, \ldots, m$, $s < t$,
are introduced such that $\phi_{t,s} = \tilde{\phi}_{t,s}$ if $\gamma_{t,s} = 1$. As with the partial correlations in the
previous section, $\phi_{t,s}$ is known exactly given the pair $(\tilde{\phi}_{t,s}, \gamma_{t,s})$. Therefore, it is sufficient
to implement a sampling scheme to evaluate the joint posterior $f(\tilde{\phi}, \gamma, \Theta \mid y)$, where
$\tilde{\phi} = \{\tilde{\phi}_{t,s};\ t = 2, \ldots, m,\ s < t\}$, as below.

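Sampling Scheme: (Bayesian Selection for a D-vine Copula)

Step 1: Generate from $f(\theta_j \mid \{\Theta \backslash \theta_j\}, \phi, y)$ for $j = 1, \ldots, m$.
Step 2: Generate from $f(\tilde{\phi}_{t,s}, \gamma_{t,s} \mid \Theta, \{\tilde{\phi} \backslash \tilde{\phi}_{t,s}\}, \{\gamma \backslash \gamma_{t,s}\}, y)$ for $t = 2, \ldots, m$, $s < t$.
Step 3: Compute $\phi$ from $(\tilde{\phi}, \gamma)$.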



Generating the marginal parameters $\theta_j$ in Step 1 is undertaken using the same MH step
outlined in Section 3.1, but where the conditional posterior is now

$$f(\theta_j \mid \{\Theta \backslash \theta_j\}, \phi, y) \propto \left( \prod_{i=1}^{n} c(u_i; \phi)\, f_j(y_{ij}; \theta_j) \right) \pi(\theta_j) .$$

In the above, $c(u_i; \phi)$ is the D-vine copula density at Equation (2.7), evaluated at observation
$u_i = (F_1(y_{i1}; \theta_1), \ldots, F_m(y_{im}; \theta_m))$.^2 The algorithm in Section 2.6 is run separately for each
observation $u_i$ to evaluate the arguments of the component pair-copulas of $c(u_i; \phi)$. Interestingly,
selection can speed up this algorithm substantially because $h_{t,s}(u_1 \mid u_2; \phi_{t,s}) = u_1$ if
$\gamma_{t,s} = 0$.

^1 Note that this parameter is often a scalar, such as for an Archimedean or bivariate Gaussian copula.
However, it can also be a vector, as in the case of a bivariate t copula where both the degrees of freedom
and correlation are parameters.

Generating the pair $(\tilde{\phi}_{t,s}, \gamma_{t,s})$ follows the same MH step outlined in Section 3.2.2 for the
partial correlations. The main difference is that whenever $\phi_{t,s}$ is vector-valued, each element
is generated separately in the same manner. Also, for many bivariate copulas (particularly
the Archimedean ones) proper non-uniform priors for $\phi_{t,s}$ are often preferred.



3.4 Equivalence of selection for Gaussian and D-vine copulas 

It is worth highlighting here that the Bayesian selection approach for the D-vine nests that
for the Gaussian copula, when the correlation matrix is parameterised by the semi-partial
correlations $\Lambda$. If the pair-copula $c^{*}$ is the bivariate Gaussian copula with density at
Equation (3.7), then $\phi_{t,s} = \lambda_{t,s}$ and $\phi = \Lambda$. In this case, the sampling schemes for Bayesian
selection for D-vine and Gaussian copulas are identical.



3.5 Posterior inference 

Estimation is based on the Monte Carlo iterates

$$\{ (\phi^{[1]}, \Theta^{[1]}), \ldots, (\phi^{[J]}, \Theta^{[J]}) \} ,$$

obtained from the sampling schemes after convergence to the joint posterior distribution,
so that $(\phi^{[j]}, \Theta^{[j]}) \sim f(\phi, \Theta \mid y)$. When Bayesian selection is undertaken, as in Sections 3.2
and 3.3, iterates $\{\gamma^{[1]}, \ldots, \gamma^{[J]}\}$ are also obtained, with $\gamma^{[j]} \sim f(\gamma \mid y)$. Monte Carlo estimates
of the posterior means can be used as point estimates. For example, the posterior means

$$E(\Theta \mid y) \approx \frac{1}{J} \sum_{j=1}^{J} \Theta^{[j]} \quad \text{and} \quad E(\phi \mid y) \approx \frac{1}{J} \sum_{j=1}^{J} \phi^{[j]}$$

are used as point estimates of the marginal model and copula parameters, respectively.
Marginal $100(1 - \alpha)\%$ posterior probability intervals can be constructed for any scalar
parameter by simply ranking the iterates, and then counting off the $\alpha J/2$ lowest values, and
the same number of the highest values.

^2 In the copula literature the $n$ observations $\{u_1, \ldots, u_n\}$ are often called the 'copula data'.

When undertaking Bayesian selection for a Gaussian copula, the estimates

$$\Pr(\gamma_{t,s} = 0 \mid y) \approx \frac{1}{J} \sum_{j=1}^{J} \left( 1 - \gamma_{t,s}^{[j]} \right) \quad \text{and} \quad E(\lambda_{t,s} \mid y) \approx \frac{1}{J} \sum_{j=1}^{J} \lambda_{t,s}^{[j]}$$

can be computed. The former gives the posterior probability that the pair $Y_t$, $Y_s$ are independent,
conditional on $(Y_{s+1}, \ldots, Y_{t-1})$, for $s < t$. The latter is the posterior mean of the
semi-partial correlation. At each sweep of the sampling scheme, some elements of $\Lambda^{[j]}$ will be
exactly equal to zero, as determined by $\gamma^{[j]}$. The estimate $E(\Gamma \mid y) \approx \frac{1}{J} \sum_{j=1}^{J} \Gamma^{[j]}$ is therefore
often called a 'model average' because it is computed by averaging over these configurations
of zero and non-zero semi-partial correlations in $\Lambda$.

Similar estimates can be computed when undertaking Bayesian selection for D-vine copulas.
When the form of the component pair-copulas nests the independence copula, so that
copula density $c^{*}(u_1, u_2; \phi^{+}) = 1$, then it is possible to compute the posterior mean of the
pair-copula parameters as $E(\phi_{t,s} \mid y) \approx \frac{1}{J} \sum_{j=1}^{J} \phi_{t,s}^{[j]}$, because $\phi_{t,s}^{[j]} = \phi^{+}$ when $\gamma_{t,s}^{[j]} = 0$.
However, when the pair-copulas do not nest the independence copula, $\phi_{t,s}$ is undefined when
$\gamma_{t,s} = 0$.
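
These summaries are simple to compute once the iterates are stored. The sketch below is illustrative only: the function names are hypothetical, `draws` is assumed to hold the post-convergence iterates of one scalar parameter, and `lam_draws` is assumed to hold the effective semi-partial correlations (zero whenever the indicator is zero).

```python
import numpy as np


def posterior_mean_and_interval(draws, alpha=0.05):
    """Posterior mean and 100(1 - alpha)% interval from ranked iterates."""
    ranked = np.sort(np.asarray(draws, dtype=float))
    n_draws = ranked.size
    # number of iterates counted off at each end (alpha * J / 2, rounded down)
    k = int(np.floor(alpha * n_draws / 2.0 + 1e-9))
    return float(ranked.mean()), float(ranked[k]), float(ranked[n_draws - 1 - k])


def selection_summaries(lam_draws, gam_draws):
    """Posterior probability of a zero and the model-averaged mean."""
    lam = np.asarray(lam_draws, dtype=float)
    gam = np.asarray(gam_draws, dtype=float)
    prob_zero = float(np.mean(1.0 - gam))   # estimate of Pr(gamma = 0 | y)
    model_average = float(np.mean(lam))     # zeros are included in the average
    return prob_zero, model_average
```

For example, with $J = 100$ iterates and $\alpha = 0.1$, five values are counted off at each end of the ranked iterates and the interval runs from the sixth smallest to the sixth largest value.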

If the measures of pairwise dependence discussed in Section 2.7 have a closed form
expression (or an accurate numerical approximation), then Monte Carlo estimates are
straightforward to compute. For example, the estimate of Kendall's tau for continuous valued data
is

$$E(\tau_{j,k} \mid y) \approx \frac{1}{J} \sum_{l=1}^{J} \tau_{j,k}(\phi^{[l]}) .$$

Posterior probability intervals are constructed using the iterates $\{\tau_{j,k}(\phi^{[1]}), \ldots, \tau_{j,k}(\phi^{[J]})\}$ in
the same manner as for the model parameters. If the pairwise dependence measures are
difficult to compute, then Kendall's tau and Spearman's rho can be obtained by evaluating
the expectations at Equation (2.8) via simulation as follows. At the end of each sweep of
a sampling scheme, generate an iterate from the copula distribution $U^{[l]} \sim C(u; \phi^{[l]})$, and
then compute

$$E(C_{j,k}(U_j, U_k)) \approx \frac{1}{J} \sum_{l=1}^{J} C_{j,k}(U_j^{[l]}, U_k^{[l]}) , \quad \text{and} \quad E(U_j U_k) \approx \frac{1}{J} \sum_{l=1}^{J} U_j^{[l]} U_k^{[l]} .$$

Simulating from most copula distributions is straightforward and fast; see Cherubini, Luciano
and Vecchiato (2004; Chap.6).
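
The simulation approach can be illustrated with a bivariate Clayton copula, chosen here only because its distribution function and conditional inverse are available in closed form. The sketch assumes that Equation (2.8) contains the usual identities, namely Kendall's tau equal to $4E(C(U_1, U_2)) - 1$ and Spearman's rho equal to $12E(U_1 U_2) - 3$; the function names and the vector `theta_draws` of posterior iterates are hypothetical.

```python
import numpy as np


def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula function, theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_sample(theta, rng):
    """Draw one pair (u, v) per element of theta by conditional inversion."""
    u = rng.uniform(size=theta.shape)
    w = rng.uniform(size=theta.shape)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
    return u, v


def dependence_by_simulation(theta_draws, seed=0):
    """Monte Carlo estimates of Kendall's tau and Spearman's rho.

    One copula draw is generated per posterior iterate, mimicking the
    'one draw at the end of each sweep' scheme described in the text.
    Assumes tau = 4 E[C(U1, U2)] - 1 and rho = 12 E[U1 U2] - 3.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta_draws, dtype=float)
    u, v = clayton_sample(theta, rng)
    tau = 4.0 * np.mean(clayton_cdf(u, v, theta)) - 1.0
    rho = 12.0 * np.mean(u * v) - 3.0
    return float(tau), float(rho)
```

For the Clayton copula Kendall's tau has the closed form $\theta/(\theta + 2)$, which gives a convenient check on the simulation estimate.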



4 Bayesian Inference for Discrete Margins 

Estimation of copula models with one or more discrete marginal distributions differs substan- 
tially from those with continuous margins; see Genest and Neslehova (2007) for an extensive 
discussion on the differences. In this section, the case where all margins are discrete is consid- 
ered, although extension to the case where some margins are discrete and others continuous 
is discussed in Smith and Khaled (2011). 

The likelihood of $n$ independent observations $y = \{y_1, \ldots, y_n\}$, each distributed as
Equation (2.1) and with probability mass function at Equation (2.5), is

$$L(\Theta, \phi) = \prod_{i=1}^{n} \Delta_{a_{i1}}^{b_{i1}} \cdots \Delta_{a_{im}}^{b_{im}}\, C(v; \phi) . \qquad (4.1)$$

Here, $v = (v_1, \ldots, v_m)$ are indices of differencing, each observation $y_i = (y_{i1}, \ldots, y_{im})$, the
upper bound $b_{ij} = F_j(y_{ij}; \theta_j)$, and the lower bound $a_{ij} = F_j(y_{ij}^{-}; \theta_j)$ is the left-hand limit of
$F_j$ at $y_{ij}$. In general, computing the likelihood involves $O(n 2^m)$ evaluations of $C$, which is
prohibitive for high $m$. Moreover, even for low values of $m$, it can be difficult to maximise
the likelihood for some copula and/or marginal model choices.
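
The source of the $2^m$ cost is visible in a direct implementation of the differencing in Equation (4.1). The sketch below is illustrative: `copula_cdf` is a hypothetical user-supplied function that evaluates $C(v; \phi)$ at a vector $v$, and the function names are made up for the example.

```python
import itertools
import math


def rectangle_probability(copula_cdf, a, b):
    """Probability mass of one observation by differencing the copula.

    Evaluates the copula at all 2^m corners of the hyper-rectangle with
    lower bounds a and upper bounds b; a corner that uses k lower bounds
    enters with sign (-1)^k.
    """
    m = len(a)
    total = 0.0
    for pick in itertools.product((0, 1), repeat=m):
        corner = [b[j] if pick[j] else a[j] for j in range(m)]
        sign = (-1) ** (m - sum(pick))
        total += sign * copula_cdf(corner)
    return total


def discrete_loglik(copula_cdf, lower, upper):
    """Log-likelihood: sum of log rectangle probabilities over observations."""
    return sum(
        math.log(rectangle_probability(copula_cdf, a_i, b_i))
        for a_i, b_i in zip(lower, upper)
    )
```

With the independence copula the rectangle probability reduces to the product of the side lengths $b_{ij} - a_{ij}$, which provides a simple check. For $m = 20$ each observation already requires more than a million copula evaluations, which motivates the data augmentation approach that follows.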

An alternative is to augment the likelihood with latent variables, and integrate them
out in a Monte Carlo fashion. From a Bayesian perspective this involves evaluating the
augmented posterior distribution by MCMC methods; an approach that is called Bayesian
data augmentation (Tanner and Wong 1987). Smith and Khaled (2011) discuss how this can
be undertaken by augmenting the posterior distribution with latent variables distributed
as $U = (U_1, \ldots, U_m) \sim C(u; \phi)$. While their approach applies to all parametric copula
functions, in the specific case of a copula constructed by inversion as at Equation (2.2),
latent variables distributed as $X \sim G(x; \phi)$ can also be used. Pitt, Chan and Kohn (2006)
propose this to estimate Gaussian copula models, and Smith, Gan and Kohn (2010b) when
$G$ is the distribution function of the skew t of Sahu, Dey and Branco (2003).

4.1 The Gaussian copula model



For the Gaussian copula, latent variables $x = \{x_1, \ldots, x_n\}$ are introduced, where
$x_i = (x_{i1}, \ldots, x_{im}) \sim N(0, \Gamma)$. The augmented likelihood is
$L(\Theta, \Gamma, x) = \prod_{i=1}^{n} f(y_i, x_i \mid \Theta, \Gamma)$, with mixed joint density

$$f(y_i, x_i \mid \Theta, \Gamma) = \mathrm{pr}(Y = y_i \mid x_i, \Theta)\, f_N(x_i; 0, \Gamma) .$$

Here, $f_N(x; \mu, V)$ is the density of a $N(\mu, V)$ distribution evaluated at $x$, and $I(Z)$ is an indicator
function equal to one if $Z$ is true, and zero otherwise. The mass function is

$$\mathrm{pr}(Y = y_i \mid x_i, \Theta) = \prod_{j=1}^{m} I(A_{ij} \le x_{ij} < B_{ij}) ,$$

where $A_{ij} = \Phi_1^{-1}(a_{ij}; 1)$ and $B_{ij} = \Phi_1^{-1}(b_{ij}; 1)$ as noted in Section 2.5, and $\Phi_1(\cdot; 1)$ is the
distribution function of a standard normal.

The likelihood of the copula model in Equation (4.1) is obtained by integrating over
the latent variables, with $L(\Theta, \Gamma) = \int L(\Theta, \Gamma, x)\, dx$. Let $x_{(j)} = \{x_{1j}, \ldots, x_{nj}\}$ be the latent
variables corresponding to the $j$th margin, then the following sampling scheme can be used
to evaluate the augmented posterior.

Sampling Scheme: (Data Augmentation for a Gaussian Copula)

Step 1: For $j = 1, \ldots, m$:
    1(a) Generate from $f(\theta_j \mid \{\Theta \backslash \theta_j\}, \{x \backslash x_{(j)}\}, \Gamma, y)$
    1(b) Generate from $f(x_{(j)} \mid \Theta, \{x \backslash x_{(j)}\}, \Gamma, y)$
Step 2: Generate from $f(\Gamma \mid \Theta, x)$.

Steps 1(a) and 1(b) together produce an iterate from the density
$f(\theta_j, x_{(j)} \mid \{\Theta \backslash \theta_j\}, \{x \backslash x_{(j)}\}, \Gamma, y)$. The conditional posterior at Step 1(b) can be derived as

$$f(x_{(j)} \mid \Theta, \{x \backslash x_{(j)}\}, \Gamma, y) \propto L(\Theta, \Gamma, x) \propto \prod_{i=1}^{n} I(A_{ij} \le x_{ij} < B_{ij})\, f_N(x_{ij}; \mu_{ij}, \sigma_{ij}^2) ,$$

where $\mu_{ij}$ and $\sigma_{ij}^2$ are the mean and variance of the conditional distribution of $x_{ij} \mid \{x_i \backslash x_{ij}\}$
obtained from the joint distribution $x_i \sim N(0, \Gamma)$. Thus, $x_{(j)}$ can be generated element-by-element
from independent constrained normal densities. In Step 1(a), $\theta_j$ is generated using
the same MH approach as in the continuous case, but where the conditional density is now

$$f(\theta_j \mid \{\Theta \backslash \theta_j\}, \{x \backslash x_{(j)}\}, \Gamma, y) \propto \prod_{i=1}^{n} \left( \Phi_1\!\left( \frac{B_{ij} - \mu_{ij}}{\sigma_{ij}}; 1 \right) - \Phi_1\!\left( \frac{A_{ij} - \mu_{ij}}{\sigma_{ij}}; 1 \right) \right) \pi(\theta_j) .$$

In Step 2 any of the existing methods for generating a correlation matrix $\Gamma$ from its
posterior distribution for Gaussian distributed data $x$ can be used, as outlined in Section 3.1.
Bayesian selection ideas can also be used as discussed in Section 3.2.
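
Step 1(b) can be sketched as follows for a single latent variable. This is a minimal illustration under stated assumptions: the conditional moments are obtained from the usual multivariate normal conditioning formulas, the constrained normal draw uses the inverse distribution function method (which is simple but can be inaccurate far in the tails), and the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def conditional_moments(x_i, gamma, j):
    """Mean and variance of x_ij given the other elements of x_i ~ N(0, gamma)."""
    others = [k for k in range(len(x_i)) if k != j]
    g_oo = gamma[np.ix_(others, others)]
    g_jo = gamma[j, others]
    w = np.linalg.solve(g_oo, g_jo)
    mu = float(w @ x_i[others])
    var = float(gamma[j, j] - w @ g_jo)
    return mu, var


def draw_constrained_normal(mu, sd, lower, upper, rng):
    """Draw from N(mu, sd^2) constrained to [lower, upper] by inversion."""
    p_lo = _STD_NORMAL.cdf((lower - mu) / sd)
    p_hi = _STD_NORMAL.cdf((upper - mu) / sd)
    p = p_lo + rng.uniform() * (p_hi - p_lo)
    p = min(max(p, 1e-15), 1.0 - 1e-15)   # keep the argument of inv_cdf inside (0, 1)
    x = mu + sd * _STD_NORMAL.inv_cdf(p)
    return min(max(x, lower), upper)      # guard against rounding at the bounds
```

In the sampling scheme, `lower` and `upper` would be $A_{ij}$ and $B_{ij}$, and the draw is repeated for $i = 1, \ldots, n$ with the moments recomputed from the current values of the other latent variables in $x_i$.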

Pitt, Chan and Kohn (2006) demonstrate the efficiency of this sampling scheme empir- 
ically, and Danaher and Smith (2011) show it can be applied effectively to a problem with 
m = 45 dimensions. Smith and Khaled (2011) propose alternative sampling schemes that 
can be used with the Gaussian copula, or with other copula models. 

4.2 Measuring dependence 

For continuous multivariate data, dependence between elements of $Y$ is captured fully by
the copula function $C$. In this case, the measures of dependence based on $C$ discussed in
Section 2.7 are adequate summaries. But when one or more margins are discrete-valued, in
general, measures of concordance involve the marginal distributions; see Denuit and Lambert
(2005), and Neslehova (2007). Nevertheless, the dependence structure of the latent
vector $U$ (or the latent vector $X$ for copulas constructed by inversion) is still informative
concerning the level and type of dependence in the data. Moreover, estimation using
nonparametric rank-based estimators becomes inaccurate (Genest and Neslehova 2007) and
likelihood-based inference, such as that outlined here, is preferable.

4.3 Link with multivariate probit and latent variable models 

Last, it is not widely appreciated that the multivariate probit model is a special case of the
Gaussian copula model with univariate probit margins (Song 2000). Data augmentation for
a Gaussian copula therefore extends the approaches of Chib and Greenberg (1998), Edwards
and Allenby (2003) and others for data augmentation for a multivariate probit model, to
other Gaussian copula models. Similarly, the approach generalises a number of Gaussian
latent variable models for ordinal data, such as that of Chib and Winkelmann (2001) and
Kottas, Müller and Quintana (2005).

5 Discussion



The impact of copula modelling in multivariate analysis has been substantial in many fields.
Yet, Bayesian inferential methods have been employed by only a few empirical analysts to
date. Nevertheless, they show great potential for computing efficient likelihood-based
inference in a number of contexts. One of these is in the modelling of multivariate discrete
data, or data with a combination of discrete and continuous margins. Here, method of
moments style estimators cannot be used effectively, and there can be computational difficulties
in maximising the likelihood, so that Bayesian data augmentation becomes attractive; see
Smith and Khaled (2011) for a full discussion. Another is in the use of hierarchical models,
including varying parameter models (Ausin and Lopes 2010) or hierarchical models for
Bayesian selection and model averaging, as discussed here. Last, while this article has
focused on the Gaussian and D-vine copulas, the Bayesian methods and ideas discussed here
are applicable to a wide range of other copula models, and it seems likely that their usage
will increase in the near future.